Assortative model for social networks 
In this paper we present a new version of a network growth model, generalized in order to describe
the behavior of social networks. The case study considered is the preprint archive at cul.arxiv.org.
Each node corresponds to a scientist, and a link is present whenever two authors wrote a paper
together. This graph is a good example of a degree-assortative network, that is to say a network where
sites with similar degree are connected to each other. The model presented is one of the few able to
reproduce such behavior, giving some insight into the microscopic dynamics underlying the graph
structure.

PACS numbers: 05.40.-a, 64.60.-i, 87.10.+e



Networks [1, 2] are present in many different phenomena. The Internet [3, 4] is a graph composed of
computers connected by cables; the WWW is a graph composed of HTML documents connected by
hyperlinks; even social structures can be described as graphs.

In the latter case the nodes are individuals connected by different relationships. Even if the
degree probability distribution $P(k)$ (i.e. the frequency of finding $k$ links per node) is very
often scale-free (i.e. $P(k) \propto k^{-\gamma}$), other quantities make it possible to distinguish
between the various cases. For this purpose, one of the most interesting is the assortativity by
degree. Assortativity can be defined as the tendency of nodes in a social network to form connections
preferentially with others similar to them [?]. This mechanism has been proposed as the key
ingredient for the formation of communities in networks [?]. Using this quantity, it is possible to
distinguish social networks from technological networks, where the behavior is instead rather
degree-disassortative, so that vertices tend to be linked to others different from them. Despite the
relative simplicity of such behavior, few models [?] of network growth are able to reproduce the
formation of communities, and none explains the difference between social and technological networks.

Here we analyze a specific case of a social network, namely the ArXiv:cond-mat repository of
preprints at cul.arxiv.gov, collected by Mark Newman [?]. The nodes are the authors of the various
papers, and a link is present between two of them whenever they wrote at least one paper together.
We are able to reproduce most of the features of this network by a suitable modification of a model
presented in Ref. [13]. The quantities we measured in the real data and in the model are the degree
probability distribution, the degree correlation between neighboring sites, the clustering, and the
site betweenness probability distribution. A summary of the results is reported in Tab. I.

The degree is the number of links per node. As expected, the degree probability distribution of the
cond-mat data shows a power-law behavior of the kind $P(k) \propto k^{-\gamma}$ with $\gamma = 3$
(see diamonds in Fig. 1).
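A minimal sketch of how a degree distribution and its log-log slope could be extracted from an edge list is given here. The function names and the plain least-squares fit on logarithmic axes are illustrative assumptions; the text does not specify the fitting procedure actually used.

```python
from collections import Counter
import math

def degree_distribution(edges):
    """Return {k: P(k)} for an undirected edge list of (u, v) pairs.

    Only nodes that appear in at least one edge are counted.
    """
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    counts = Counter(degree.values())
    n_nodes = len(degree)
    return {k: c / n_nodes for k, c in sorted(counts.items())}

def loglog_slope(xs, ys):
    """Least-squares slope of log(y) against log(x).

    For P(k) ~ k^(-gamma) the returned slope estimates -gamma.
    """
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx = sum(lx) / len(lx)
    my = sum(ly) / len(ly)
    num = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    den = sum((a - mx) ** 2 for a in lx)
    return num / den
```

On real data the tail of the distribution is noisy, so a binned or cumulative distribution would normally be fitted rather than the raw frequencies.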

We then measure the degree correlation between nodes. This is done by introducing the quantity
$K_{nn}(k)$, giving the average degree of the neighbors of a site whose degree is $k$. $K_{nn}$
increases if nodes are correlated by degree (assortative networks). It decreases if they are
anti-correlated (disassortative networks). It is flat if they are uncorrelated (for example, in the
BA model [6]). $K_{nn}$ in the data has an increasing trend, consistent with our expectation for an
assortative network. A power law seems to be an appropriate fit in the region of growth,
$K_{nn}(k) \propto k^{\phi}$, where $\phi$ is about 0.2 (see diamonds in Fig. 2). Another measure of
assortativity we considered is the assortativity coefficient $r$. A complete definition of this
quantity can be found in Ref. [7]; here we can say that it is proportional to the connected
degree-degree correlation function. In this paper we find that $r$ and $\phi$ show the same
behaviour on varying the parameters of
the model. We therefore focus our analysis only on the 

The clustering coefficient $C_i$ of a site $i$ gives the probability that two nearest neighbors of
vertex $i$ are also neighbors of each other. $cc(k)$ is the average clustering coefficient for sites
whose degree is $k$, and it measures the tendency to form cliques in which the nearest neighbors of a
node (with degree $k$) are connected to each other. In real networks this usually decreases as a
power law, $cc(k) \propto k^{\psi}$ ($\psi = -0.8$ for the data we analyzed), because hubs tend to
play the role of connections between separate clusters in the graph, i.e. clusters that have few
interconnections other than the ones passing through the hub. High-degree nodes therefore tend to
have a low clustering coefficient.
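The degree-dependent averages defined above can be computed directly from an adjacency structure. The following is a minimal sketch under stated assumptions: all function names are hypothetical, and the coefficient $r$ is assumed to be the Pearson correlation of the degrees found at the two ends of an edge, which may differ in detail from the definition in the cited reference.

```python
from collections import defaultdict
import math

def build_adjacency(edges):
    """Undirected adjacency {node: set_of_neighbours}; self-loops are ignored."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return adj

def knn_by_degree(adj):
    """K_nn(k): mean neighbour degree, averaged over nodes of degree k."""
    sums, counts = defaultdict(float), defaultdict(int)
    for node, nbrs in adj.items():
        k = len(nbrs)
        sums[k] += sum(len(adj[n]) for n in nbrs) / k
        counts[k] += 1
    return {k: sums[k] / counts[k] for k in sorted(sums)}

def clustering_by_degree(adj):
    """cc(k): mean clustering coefficient over nodes of degree k (k >= 2)."""
    sums, counts = defaultdict(float), defaultdict(int)
    for node, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            continue  # the coefficient is undefined for fewer than two neighbours
        nb = list(nbrs)
        links = sum(1 for i in range(k) for j in range(i + 1, k)
                    if nb[j] in adj[nb[i]])
        sums[k] += 2.0 * links / (k * (k - 1))
        counts[k] += 1
    return {k: sums[k] / counts[k] for k in sorted(sums)}

def assortativity(adj):
    """Pearson correlation of end-point degrees, each edge taken in both directions."""
    pairs = [(len(adj[u]), len(adj[v])) for u in adj for v in adj[u]]
    m = len(pairs)
    mean = sum(x for x, _ in pairs) / m
    cov = sum(x * y for x, y in pairs) / m - mean * mean
    var = sum(x * x for x, _ in pairs) / m - mean * mean
    # var == 0 happens when all nodes have the same degree; r is then undefined
    return cov / var if var > 0 else math.nan
```

Fitting a power law to the output of `knn_by_degree` or `clustering_by_degree` would then give estimates of $\phi$ and $\psi$ respectively.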

The betweenness $b_i$ of a vertex $i$ gives the probability that the site $i$ is on the path between
two other vertices in the graph. It can therefore be interpreted as a measure of the role played by
vertex $i$ in the social relation between two persons $j$ and $k$. This quantity behaves as a power
law both in its distribution, $P(b) \propto b^{-\eta}$ ($\eta = 2.2$), and in its dependence upon $k$.
Analogously to the clustering case, we define the average betweenness $b(k)$ for vertices whose
degree is $k$. From Fig. 3 we find $b(k) \propto k^{\epsilon}$ with $\epsilon = 1.81$.
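As an illustration, site betweenness on an unweighted graph can be computed with Brandes' accumulation scheme. The sketch below assumes that "path" means shortest path and returns unnormalised counts (dividing by the number of node pairs would give a fraction); the function name and input format are hypothetical, and nothing here is claimed to be the procedure used for the reported exponents.

```python
from collections import deque

def betweenness(adj):
    """Unnormalised shortest-path betweenness of every node.

    `adj` maps each node to an iterable of its neighbours (undirected graph).
    """
    bc = {v: 0.0 for v in adj}
    for s in adj:
        stack = []                        # nodes in order of non-decreasing distance
        preds = {v: [] for v in adj}      # predecessors on shortest paths from s
        sigma = {v: 0 for v in adj}       # number of shortest paths from s
        dist = {v: -1 for v in adj}
        sigma[s] = 1
        dist[s] = 0
        queue = deque([s])
        while queue:                      # breadth-first search from s
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        while stack:                      # back-propagate pair dependencies
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # every unordered pair has been counted once from each end
    return {v: b / 2.0 for v, b in bc.items()}
```

Averaging the returned values over nodes of equal degree gives an estimate of $b(k)$.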

The model we defined in order to reproduce the data is inspired by the preferential attachment
model [6]. The main variation consists in allowing growth by the addition of new links between old
nodes. More specifically, at every step of growth:

1. with probability $p$ a new node is wired to an existing one; the choice of the destination node
is left to the Barabasi-Albert preferential attachment rule ('rich gets richer'). Thus the
probability of adding a new node and connecting it to an old node $i$ is

$$p \, \frac{k_i}{\sum_j k_j} \qquad (1)$$



2. with probability $(1 - p)$ a new edge is added (if absent) between two existing nodes. These are
chosen on the basis of their degree. In other words, the probability of adding an edge between
node 1 and node 2 is a function $P(k_1, k_2)$ of their degrees. This can be written as
$P_1(k_1) P_2(k_2 | k_1)$, the second factor being a conditional probability. $P_1(k_1)$ is the rule
for choosing the first of the two nodes, and again it is determined by preferential attachment. The
functional form of $P_2(k_2 | k_1)$ can be chosen so as to favor links between nodes of similar or
different degree. In this way, the probability of adding a new edge and connecting two old
non-linked nodes is

$$(1 - p) \, \frac{k_1}{\sum_j k_j} \, P_2(k_2 | k_1) \qquad (2)$$
In the limit of $p = 1$ the model reduces to a traditional BA tree. In order to reproduce the
assortative behavior we have explored two different functional forms: an inverse dependence

$$P_2(k_2 | k_1) \propto \ldots$$

and an exponential dependence, which clearly has a stronger effect

$$P_2(k_2 | k_1) \propto e^{\ldots}$$
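The two growth moves described above can be sketched in a few lines. This is only an illustrative implementation under explicit assumptions, not the code used for the reported results: the seed graph (two linked nodes), the function names, the normalisation of $P_2$ over the nodes not yet linked to the first one, and the concrete forms $1/(|k_1 - k_2| + 1)$ and $e^{-|k_1 - k_2|}$ chosen for the inverse and exponential dependence are all assumptions made here for concreteness.

```python
import math
import random

def p2_weight(k1, k2, form="inverse"):
    """Unnormalised weight for a second node of degree k2, given k1.

    Both forms are assumptions of this sketch; they only need to
    decrease with the degree difference |k1 - k2|.
    """
    diff = abs(k1 - k2)
    if form == "inverse":
        return 1.0 / (diff + 1.0)
    if form == "exponential":
        return math.exp(-diff)
    raise ValueError("unknown form: " + form)

def grow_network(n_steps, p, form="inverse", seed=None):
    """Grow a graph for n_steps and return {node: set_of_neighbours}."""
    rng = random.Random(seed)
    adj = {0: {1}, 1: {0}}   # assumed seed: two linked nodes
    stubs = [0, 1]           # node i is listed k_i times
    for _ in range(n_steps):
        # Preferential attachment: a uniform draw from `stubs`
        # selects node i with probability k_i / sum_j k_j.
        first = rng.choice(stubs)
        if rng.random() < p:
            # Growth move: attach a new node to `first`.
            new = len(adj)
            adj[new] = {first}
            adj[first].add(new)
            stubs.extend((new, first))
            continue
        # Mixing move: link `first` to an old node it is not yet linked to.
        k1 = len(adj[first])
        candidates = [v for v in adj if v != first and v not in adj[first]]
        weights = [p2_weight(k1, len(adj[v]), form) for v in candidates]
        if not candidates or sum(weights) <= 0.0:
            continue  # no admissible partner: the step adds nothing
        second = rng.choices(candidates, weights=weights, k=1)[0]
        adj[first].add(second)
        adj[second].add(first)
        stubs.extend((first, second))
    return adj
```

For example, `grow_network(10000, 0.3, form="exponential", seed=0)` would produce one realisation in the regime where the wiring component prevails. The candidate scan makes each mixing step linear in the number of nodes, which is acceptable for a sketch but slow for large graphs.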
Results of simulations for the various values of $p$ are summarized in Tab. I, where the fitted
exponents of the distributions and the global quantities describing the networks are reported. As
$p$ grows from 0.1 to 1.0, the change in the statistical properties is consistent with the rough
estimate for the degree distribution exponent given in Ref. [?].

As $p$ tends to 1.0, the exponent approaches the value 3 of the BA model. A radically different
behavior appears in the exponential case. While for high $p$ we still have a scale-free
distribution, as $p$ decreases a structure in $k$ emerges. Two regimes become visible: a power-law
distribution for low $k$ and a peaked distribution for high $k$.

Similar behavior is evident for all the quantities depending on $k$. The transition happens around
$p = 0.5$. This behavior can be explained as follows. Edges are added mainly between high-degree
nodes because of the 'preferential attachment option' adopted in the choice of the first vertex.
Moreover, the strong assortativity deriving from the exponential form imposes a high degree on the
second node as well. Therefore, when the 'wiring component' of the growth prevails ($p$ below 0.5), a
cluster of hubs appears. Their degrees are sharply distributed around a high value. Thus a strong
assortativity can break up the self-similar structure of the graph, superimposing a distribution
with a typical scale on the scale-free one. This highlights the typical aspect of an assortative
network, where the hubs (highly connected nodes) connect with other hubs, generating a
core-periphery structure. This structure is emphasized in the exponential case, where assortativity
becomes so large as to induce a phase transition from a scale-free graph to a network with a
characteristic scale for high degrees.

The slope of $K_{nn}(k)$ grows as the assortativity is increased, either by moving from the inverse
to the exponential form or by reducing the value of $p$. The slight inversion in the growth of the
exponent visible at small $p$ can be explained as a finite-size effect, highlighted by the intense
assortativity for very low values of the parameter $p$. The BA limit is visible as well, the
distribution being roughly flat for $p = 1.0$. By measuring $\phi$ and $r$ we note that their trends,
as the parameters change, are analogous. Reasonably enough, we can conclude that, at least for our
model, the exponent and the coefficient carry the same information.

The clustering coefficient as a function of the degree fails to reproduce the real trends. These
usually decrease as a power law; the model, instead, generates increasing trends. We fit them with a
power law with positive exponent. We can explain this discrepancy qualitatively by taking into
account the high-degree vertices. In real networks hubs tend to play the role of connections between
separate clusters in the graph, which have few links between each other (apart from the ones
attached to the hub). Therefore these nodes tend to have a low clustering coefficient. In our model,
on the other hand, all the hubs are aggregated together. Thus, even though it produces an
assortative network, it cannot reproduce a network with $cc(k)$ decreasing with $k$. We comment that
such behavior in the real data is due to the different areas of expertise of the various authors,
such that the most productive scientists in one discipline do not collaborate with the top
scientists of other disciplines within cond-mat. Imposing such a separation on the hubs produced by
the model reproduces the correct behavior of the data (or, rather, analyzing the data by dividing
the papers according to their fields).

As regards the betweenness, $b(k)$ is an increasing function of $k$ (hubs are crucial in the
exchange of information). On the other hand, its slope decreases as $p$ is reduced. In a tree-like
structure ($p = 1.0$), hubs play the role of bottlenecks for the flow of information between
separate parts of the network. Therefore, they have very high site betweenness. Approaching a
core-periphery structure, each node of the core becomes approximately as good as the others in
performing this job. Therefore the site betweenness of high-degree nodes decreases.

The site betweenness distribution $P(b) \propto b^{-\eta}$ is plotted after integration in Fig. 3.
We obtain a power law with an exponent that does not depend significantly on $p$. Its average value
is 2.0, which is equal to the measured value for a BA tree [?]. It is interesting to notice that
here, too, a characteristic scale appears at high values of the site betweenness. This is visible in
the bump that distorts the scale-free nature of the integrated distribution. Notice that we would
see a similarly distorted trend if we integrated the degree distribution.

In Ref. [19] the following scaling relation is demonstrated for the BA model:

$$b \propto k^{(\gamma - 1)/(\eta - 1)} \qquad (6)$$

Thus, the exponent of the site betweenness plotted versus $k$ is related to the previous two by the
equality

$$\epsilon = (\gamma - 1)/(\eta - 1) \qquad (7)$$

This relation holds for disassortative and non-assortative networks, while deviations are shown for
assortative ones in Ref. [23]. By computing the difference between the two sides of Eq. (7) we
noticed a slightly growing trend as $p$ is decreased, giving further evidence that assortativity
breaks the scaling relation.
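As a rough illustration, one can insert the exponents quoted earlier for the cond-mat data ($\gamma = 3$, $\eta = 2.2$, $\epsilon = 1.81$) into relation (7). This is only a back-of-the-envelope check that takes the fitted values at face value and ignores their uncertainties:

```latex
\frac{\gamma - 1}{\eta - 1} = \frac{3 - 1}{2.2 - 1} = \frac{2}{1.2} \approx 1.67,
\qquad
\left| \epsilon - \frac{\gamma - 1}{\eta - 1} \right| \approx |1.81 - 1.67| \approx 0.14 .
```

For comparison, the BA-tree values $\gamma = 3$ and $\eta = 2.0$ give $(\gamma - 1)/(\eta - 1) = 2$.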

The qualitative agreement between the distributions of the real data and of the simulation shows
that our model is able to capture the basic aspects of the real graph, with the only exception,
mentioned above, of the clustering coefficient versus $k$. A quantitative comparison suggests that
the exponential form is too strong to describe existing networks. In fact, the appearance of a
characteristic-scale structure like the one predicted by our model has not been observed in any of
the real assortative networks studied until now. One must notice as well the slight difference in
the exponents of the site betweenness distribution (2.0 for the simulation and 2.2 for cond-mat).
Following Ref. [?], networks should be divided into two universality classes according to the
exponent of their site betweenness distribution. In fact, this seems always to assume one of the two
values 2.0 and 2.2. Co-authorship networks fall in the second class. Therefore, if the hypothesis of
Ref. [?] were confirmed, our model would fail to predict the correct universality class for the
networks that it is meant to represent. However, this would be reasonable, since the model can be
reduced to a BA tree, which falls in the first class.

In conclusion, we have studied a generalized graph growth model where, by tuning a parameter $p$,
it is possible to weight the roles of growing (addition of new nodes) and mixing (addition of new
edges) in the microscopic behavior of the network. The assortativity can be controlled as well, by
fixing a functional form for the wiring probability. Macroscopic characteristics of the network,
i.e. statistical distributions, have been derived by simulations in the assortative case. The
results reveal the effects of assortativity on the topology of a network, which can be as dramatic
as a phase transition. Moreover, the simulations succeed in reproducing most of the features of real
assortative networks. Future work could focus on many aspects: new nodes could be added carrying two
edges instead of one, in order to have a BA graph rather than a BA tree in the $p = 1.0$ limit; the
rates of addition of new nodes and of new links could be measured for real networks to fine-tune the
parameter $p$; more general functional forms for the wiring could be investigated, and even the
preferential attachment choice could be changed, in order to have significant wiring also for
low-degree nodes. Further extensions are possible because of the rich flexibility of the model.

We thank the FET Open project IST-2001-33555 
FIG. 2: Average nearest-neighbour degree versus $k$ in the inverse and exponential case, and for
cond-mat. In the exponential case a structure at high $k$ is visible for low $p$. For the cond-mat
distribution, a maximal and a minimal slope can be defined.



FIG. 1: Degree distribution in the inverse case. The slope increases monotonically as $p$ grows from
0.1 to 1.0. The distribution for cond-mat is reported for comparison. In the inset, degree
distribution in the exponential case. As $p$ becomes smaller than 0.5 a peaked structure at high
degrees appears.
FIG. 3: Integrated site betweenness distribution in the inverse and exponential case, and for
cond-mat. As $p$ tends to 1.0 the branching in the graphs increases. Given a branch of $n$ nodes,
$b_n$ starting from the leaves is proportional to $(N - 1)$, $2(N - 2)$, $4(N - 3)$, ...,
$2^{n-1}(N - n)$. Consequently, in a tree-like structure the site betweenness is quantized. This
appears in the distribution as a succession of power-law distributed spikes (stairs in the
integrated distribution). For small $p$, a bump is visible, signalling a characteristic scale. In
the inset, $b$ versus $k$ in the inverse and exponential case, and for cond-mat. In the exponential
case a structure at high $k$ is visible for low $p$.



TABLE I: Results of numerical simulation of the model: exponents of the distributions and
assortativity coefficient. The last row refers to the cond-mat co-authorship network. The exponent
of the site betweenness distribution is not reported since its fluctuations around the average
value of 2.0 are negligible.
For cond-mat it is 2.2. $\gamma = 2 + \ldots$ and $n = |\epsilon - (\gamma - 1)/(\eta - 1)|$. The
error on the figures is always less than 5%.